Shangeth Rajaa

Anyreach AI

Senior ML Scientist working on Voice AI, Turn-Taking, Full-Duplex Spoken Dialogue Systems, and Multi-Modal Speech LLMs.

#voice ai

Content tagged with "voice ai"

DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining
2026-03-09 | Shangeth Rajaa | Interspeech 2026 (Accepted)
#Voice AI #Turn-Taking #Spoken Dialogue #Speech LLM

Dual-channel generative pretraining for learning natural turn-taking in spoken dialogue without labeled data. A 0.5B model that outperforms models 6x its size on turn prediction.

Speech LLMs for Conversations
2024-05-09
#Voice AI #Speech LLM #Conversational AI

A multimodal speech LLM that processes audio directly to enhance conversational AI while reducing overhead compared to traditional ASR-LLM-TTS pipelines.

Improving End-to-End SLU Performance with Prosodic Attention and Distillation
2023-08-20 | Shangeth Rajaa | Interspeech 2023, pp. 1114–1118
#Voice AI #Spoken Language Understanding #Prosody #Speech

Two techniques for incorporating prosody into end-to-end SLU: prosody-attention and prosody-distillation. Intent classification accuracy on SLURP improves by up to 8%.

Improving Spoken Language Identification with Map-Mix
2023-06-04 | Shangeth Rajaa, Kriti Anandan, Swaraj Dalmia, Tarun Gupta, Eng Siong Chng | ICASSP 2023 (IEEE International Conference on Acoustics, Speech and Signal Processing), pp. 1–5
#Voice AI #Speech #Language Identification #Data Augmentation

Map-Mix: a data augmentation approach using model training dynamics to guide latent mixup sampling, giving ~2% weighted F1 improvement on low-resource dialect classification.

Skit-S2I: An Indian Accented Speech to Intent Dataset
2022-12-26 | Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil | arXiv preprint arXiv:2212.13015
#Voice AI #Spoken Language Understanding #Dataset #Speech

The first public Indian-accented SLU dataset in the banking domain. SSL speech representations beat ASR-based approaches for intent classification.

Feature Disentanglement - I
2022-02-22
#Voice AI #Speech Representation #Deep Learning

How deep learning models can isolate independent factors of variation in data through VAEs and Beta-TCVAE, enabling controlled synthesis and better downstream representations.

Learning Speaker Representation with Semi-supervised Learning Approach for Speaker Profiling
2021-10-24 | Shangeth Rajaa, Pham Van Tung, Chng Eng Siong | arXiv preprint arXiv:2110.13653
#Voice AI #Speaker Profiling #Speech Representation #Semi-supervised Learning

A semi-supervised framework for speaker profiling that leverages external unlabelled corpora via supervised, unsupervised, and consistency training, achieving an RMSE of 6.8 years on age estimation.
